Mining Patents with tmChem, GNormPlus and an Ensemble of Open Systems
Abstract
The significant amount of medicinal chemistry information contained in patents makes them an attractive target for text mining. The CHEMDNER task at BioCreative V focused on information extraction from patents. This manuscript describes our submissions to the CEMP (chemical named entity recognition) and GPRO (gene and related object identification) subtasks. Our CEMP submission is an ensemble of five open systems, including both versions of tmChem, our previous work on chemical named entity recognition. Their output is combined using a machine learning classification approach. Our CEMP system obtained 0.8752 precision and 0.9129 recall, for an F-score of 0.8937. Our submission to the GPRO task is an extension of GNormPlus, our previous work on gene and protein named entity recognition. Our GPRO system obtained 0.8143 precision and 0.8141 recall, for an F-score of 0.8137. Both submissions achieved the highest performance in their respective tasks.
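The abstract does not say which features or classifier were used to combine the systems' output, so the following is only a minimal sketch of the general idea, under stated assumptions: each candidate span becomes a binary vote vector (one entry per system), and a classifier trained on gold annotations decides whether to keep it. The perceptron, the three placeholder systems (the paper uses five), the character offsets, and the toy training labels are all hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets of a candidate mention


def vote_vector(span: Span, system_outputs: List[Set[Span]]) -> Tuple[int, ...]:
    """One binary feature per system: 1 if that system proposed the span."""
    return tuple(int(span in spans) for spans in system_outputs)


def score(x, weights, bias):
    """Linear score of a vote vector."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_perceptron(examples, max_epochs: int = 200):
    """Train a perceptron on (vote_vector, label) pairs with labels 0 or 1.

    Assumption: a perceptron stands in for whatever classifier the
    authors actually used.
    """
    weights, bias = [0] * len(examples[0][0]), 0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            pred = int(score(x, weights, bias) > 0)
            if pred != y:
                delta = y - pred  # +1 for a missed mention, -1 for a false one
                weights = [w + delta * xi for w, xi in zip(weights, x)]
                bias += delta
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights, bias


def keep_span(x, weights, bias) -> bool:
    """Classifier decision: keep the candidate span in the ensemble output."""
    return score(x, weights, bias) > 0


# Toy training data (hypothetical): a span is a true mention if system 0
# proposed it, or if systems 1 and 2 both proposed it.
TRAIN = [
    ((1, 0, 0), 1),
    ((0, 1, 0), 0),
    ((0, 0, 1), 0),
    ((0, 1, 1), 1),
    ((1, 1, 0), 1),
    ((1, 1, 1), 1),
]

weights, bias = train_perceptron(TRAIN)

# Hypothetical outputs of three systems on one sentence.
outputs = [
    {(0, 7), (12, 19)},
    {(12, 19), (25, 30)},
    {(25, 30), (40, 44)},
]
candidates = sorted(set().union(*outputs))
accepted = [
    s for s in candidates
    if keep_span(vote_vector(s, outputs), weights, bias)
]
```

Here the span proposed only by the third system is rejected, while spans backed by the first system or by two agreeing systems are kept. A learned combination of this kind can weight a reliable system more heavily than simple majority voting would.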
Related articles
Beyond accuracy: creating interoperable and scalable text-mining web services
The biomedical literature is a knowledge-rich resource and an important foundation for future research. With over 24 million articles in PubMed and an increasing growth rate, research in automated text processing is becoming increasingly important. We report here our recently developed web-based text mining services for biomedical concept recognition and normalization. Unlike most te...
Patent mining: combining dictionary-based and machine-learning approaches
Exploration of the chemical patent space is essential for early-stage medicinal chemistry activities. The BioCreative CHEMDNER-patents task focuses on the recognition of chemical compounds in patents. This includes recognition of chemical named entities in patents (CEMP), classification of chemical-related patent titles and abstracts (CPD), and recognition of genes and proteins in patent abstra...
Chemical entity recognition in patents by combining dictionary-based and statistical approaches
We describe the development of a chemical entity recognition system and its application in the CHEMDNER-patent track of BioCreative 2015. This community challenge includes a Chemical Entity Mention in Patents (CEMP) recognition task and a Chemical Passage Detection (CPD) classification task. We addressed both tasks by an ensemble system that combines a dictionary-based approach with a statistic...
GNormPlus: An Integrative Approach for Tagging Genes, Gene Families, and Protein Domains
The automatic recognition of gene names and their associated database identifiers from biomedical text has been widely studied in recent years, as these tasks play an important role in many downstream text-mining applications. Despite significant previous research, only a small number of tools are publicly available and these tools are typically restricted to detecting only mention level gene n...
tmChem: a high performance approach for chemical named entity recognition and normalization
Chemical compounds and drugs are an important class of entities in biomedical research with great potential in a wide range of applications, including clinical medicine. Locating chemical named entities in the literature is a useful step in chemical text mining pipelines for identifying the chemical mentions, their properties, and their relationships as discussed in the literature. We introduce...
Publication date: 2015